ABSTRACT


This  paper  makes  two  important  contributions  to  the  theory  of  bandwidth 
selection  for  kernel  density  estimators  under  right  censorship.  First,  an 
asymptotic  representation  of  the  integrated  squared  error  into  easily
understood  variance  and  squared  bias  components  is  given.  Second,  it  is  shown 
that  if  the  bandwidth  is  chosen  by  the  data-based  method  of  least  squares 
cross-validation,  then  it  is  asymptotically  optimal  in  a  compelling  sense.  A 
by-product  of  the  first  part  is  an  interesting  comparison  of  the  two  most 
popular  kernel  estimators. 


1.  INTRODUCTION 


Kernel-type  estimators  of  an  unknown  probability  density  function  from 
right-censored  data  have  been  studied  recently  by  several  authors  (e.g.  Blum 
and  Susarla,  1980;  Diehl  and  Stute,  1985;  Foldes,  Rejto  and  Winter,  1981; 
McNichols  and  Padgett,  1985;  and  Stute,  1985).  Padgett  and  McNichols  (1984) 
gave  a  review  of  available  results  on  kernel  density  estimation  from  censored 
data.  The  details  of  the  forms  of  these  estimators  are  in  section  2. 

As  in  the  complete  sample  (i.e.  uncensored)  case,  the  choice  of  the 
smoothing  parameter,  or  bandwidth,  is  crucial  to  the  effective  performance  of 
the  estimator.  Intuitively,  if  the  bandwidth  is  too  small,  there  is  too  much 
"variance"  in  the  sense  that  features  which  belong  only  to  the  particular  data 
set,  and  not  to  the  underlying  density,  may  be  seen  in  the  estimate.  If  the 
bandwidth  is  too  large,  there  is  too  much  "bias"  in  the  sense  that  features  of 
the  density  are  smoothed  away. 

In  the  complete  sample  case,  an  elegant  mathematical  quantification  of  the 
above  intuition  may  be  found  in  Rosenblatt  (1956),  Parzen  (1962),  Watson  and 
Leadbetter  (1963),  and  Rosenblatt  (1971).  In  particular,  they  show  that  the 
Mean  Integrated  Squared  Error  (MISE)  has  an  asymptotic  decomposition  as  a 
simple  variance  term,  a  simple  squared  bias  term,  and  some  negligible  terms. 

In  section  3  it  is  seen  how  this  type  of  decomposition  may  be  done  in  the  case 
of  randomly  right-censored  data.  Along  the  way,  approximations  are  found  for 
the two most popular censored-data kernel estimators which give insight into
exactly how they are related.

While  this  asymptotic  representation  of  MISE  provides  considerable 
insight, it is not very useful for selecting the bandwidth because the
minimizer of the two dominant terms contains quantities which are harder to
estimate  than  f  itself.  As  this  is  also  true  in  the  complete  sample  case, 
there  has  recently  been  considerable  work  done  there  on  data-based  bandwidth 
selectors.  One  of  the  most  promising  methods  is  least  squares  cross- 
validation,  introduced  by  Rudemo  (1982)  and  Bowman  (1984).  The  bandwidth 
selected  in  this  way  has  been  shown  to  be  asymptotically  optimal  under  various 
conditions  by  Hall  (1983),  Stone  (1984),  Burman  (1985),  Hall  (1985),  and  Marron 
(1985).  Deeper  asymptotic  properties  are  established  in  Hall  and  Marron 
(1985a, b).

In section 4, it is shown that least squares cross-validation is also
effective  in  the  case  of  right-censored  data.  In  particular,  asymptotic 
optimality,  in  the  same  sense  as  for  the  complete  sample  case,  is  established. 
Section  5  contains  the  proofs.  Finally,  a  practical  method  for  choosing 
between  the  two  different  common  kernel  estimators  is  suggested. 

2.  THE  ESTIMATORS

The  two  best  known  kernel  density  estimators  are  based  on  estimates  of 
distribution  functions.  In  the  censored  data  case,  a  widely  used  distribution 
function  estimator  is  defined  as  follows. 

Let $X_1^0,\dots,X_n^0$ denote the i.i.d. survival times of n items or
individuals that are censored on the right by i.i.d. random variables
$U_1,\dots,U_n$ which are independent of the $X_i^0$'s. Denote the common
distribution function of the $X_i^0$'s by $F^0$ and that of the $U_i$'s by
$H$. Let $H^* = 1-H$. It is assumed that $F^0$ is absolutely continuous with
density $f^0$ and that $H$ is continuous.

The observed randomly right-censored data are denoted by the pairs
$(X_i,\Delta_i)$, $i=1,\dots,n$, where

$$X_i = \min\{X_i^0, U_i\}, \qquad \Delta_i = 1_{[X_i^0 \le U_i]},$$

with $1_{[\cdot]}$ denoting the indicator random variable of the event $[\cdot]$.

Based on $(X_i,\Delta_i)$, $i=1,\dots,n$, a popular estimator of the survival
function $1-F^0(t)$ is the product-limit (PL) estimator, proposed by Kaplan and
Meier (1958) and shown to be "self-consistent" by Efron (1967). Let
$(Z_i,\Lambda_i)$, $i=1,\dots,n$, denote the ordered $X_i$'s along with their
corresponding $\Delta_i$'s. The PL estimator of $1-F^0(t)$ is defined by

$$\hat P_n(t) = \begin{cases}
1, & 0 \le t \le Z_1, \\
\prod_{i=1}^{k-1} \left(\dfrac{n-i}{n-i+1}\right)^{\Lambda_i}, & Z_{k-1} < t \le Z_k,\ k=2,\dots,n, \\
0, & t > Z_n.
\end{cases}$$

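Denote the PL estimator of $F^0(t)$ by $\hat F_n(t) = 1-\hat P_n(t)$, and let
$s_j$ denote the jump of $\hat P_n$ (or $\hat F_n$) at $Z_j$, that is

$$s_j = \begin{cases}
1-\hat P_n(Z_2), & j=1, \\
\hat P_n(Z_j) - \hat P_n(Z_{j+1}), & j=2,\dots,n-1, \\
\hat P_n(Z_n), & j=n.
\end{cases}$$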


Then for $j < n$, $s_j = 0$ if and only if $\Lambda_j = 0$, that is, $Z_j$ is a
censored observation. For various properties of the PL estimator, see Breslow
and Crowley (1974), Csörgő and Horváth (1983), Földes and Rejtő (1981), Földes,
Rejtő and Winter (1980), Gill (1983), and Wellner (1982), among others.
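As a minimal sketch of the definitions above (not part of the original development), the jumps $s_j$ of the product-limit estimator can be computed directly from the ordered data. The function name `pl_jumps` is hypothetical, NumPy is assumed to be available, and the observations are assumed to have no ties.

```python
import numpy as np

def pl_jumps(x, delta):
    """Jumps s_j of the product-limit estimator at the ordered observations.

    x     : observed times X_i = min(X_i^0, U_i)
    delta : 1 if X_i is uncensored, 0 if it is censored
    Returns (z, lam, s): ordered times, their indicators, and the jumps.
    Assumes no ties among the observations.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    order = np.argsort(x)
    z, lam = x[order], delta[order]
    n = len(z)
    i = np.arange(1, n + 1)
    # Factor ((n - i) / (n - i + 1)) ** Lambda_i contributed by Z_i.
    factors = ((n - i) / (n - i + 1.0)) ** lam
    # P_n(Z_k) is the product over i < k (left-continuous convention).
    surv_at = np.concatenate(([1.0], np.cumprod(factors[:-1])))
    # P_n(Z_{j+1}) for j < n, and 0 beyond Z_n.
    surv_next = np.concatenate((surv_at[1:], [0.0]))
    return z, lam, surv_at - surv_next
```

With this convention the jumps always sum to one, because any mass remaining after the largest observation is placed at $Z_n$ whether or not it is censored.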

The distribution function estimator, $\hat F_n$, is very naturally used to
construct a density estimator by defining

$$\hat f_n(x) = h^{-1}\int K\!\left(\frac{x-t}{h}\right) d\hat F_n(t)
 = h^{-1}\sum_{j=1}^{n} s_j\, K\!\left(\frac{x-Z_j}{h}\right).$$

This  estimator  has  been  studied  by  Foldes,  Rejto  and  Winter  (1981),  McNichols 
and  Padgett  (1986),  Diehl  and  Stute  (1985),  and  Stute  (1985). 

An alternative kernel estimator has been proposed by Blum and Susarla
(1980), extending the results of Rosenblatt (1976) to censored data. It is
motivated by the fact that a reasonable (and technically easy to handle)
estimate of $f^0(x)H^*(x)$ is given by

$$(f^0H^*)_n(x) = (nh)^{-1}\sum_{j=1}^{n} K\!\left(\frac{x-X_j}{h}\right) 1_{[\Delta_j=1]}.$$

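Hence, it makes sense to estimate $f^0(x)$ by $(f^0H^*)_n(x)$ divided by an
estimate of $H^*(x)$. If we reverse the intuitive roles played by $X_i^0$ and
$U_i$, then the product-limit estimator for $H^*$ is given by

$$\hat H_n^*(t) = \begin{cases}
1, & 0 \le t \le Z_1, \\
\prod_{i=1}^{k-1} \left(\dfrac{n-i}{n-i+1}\right)^{1-\Lambda_i}, & Z_{k-1} < t \le Z_k,\ k=2,\dots,n, \\
0, & t > Z_n.
\end{cases}$$

This does not make a good denominator because it takes on the value zero, so
Blum and Susarla propose changing $\hat H_n^*$ slightly to a modified estimator,
denoted here by $H_n^*$ (its displayed definition is not legible in the source).

Hence, define

$$f_n^*(x) = [H_n^*(x)]^{-1}(nh)^{-1}\sum_{j=1}^{n} K\!\left(\frac{x-X_j}{h}\right) 1_{[\Delta_j=1]}.$$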


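To get some idea for what the relationship is between the estimators
$\hat f_n$ and $f_n^*$, note that from Susarla, Tsai, and Van Ryzin (1984) for
each j,

$$s_j = \Lambda_j\,[n\hat H_n^*(Z_j)]^{-1}.$$

Hence, we may write:

$$\hat f_n(x) = \sum_{j=1}^{n} \frac{\Delta_j}{n\hat H_n^*(X_j)h}\, K\!\left(\frac{x-X_j}{h}\right), \tag{2.1}$$

$$f_n^*(x) = \sum_{j=1}^{n} \frac{\Delta_j}{nH_n^*(x)h}\, K\!\left(\frac{x-X_j}{h}\right). \tag{2.2}$$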


Since $\hat H_n^*$ and $H_n^*$ are essentially the same, the only significant
difference between the estimators is the argument of the estimate of $H^*$. It
will be seen in the next section that the difference is typically not negligible.
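The forms (2.1) and (2.2) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Epanechnikov kernel is one convenient compactly supported choice, and, because the Blum and Susarla modification is not reproduced here, the denominator in (2.2) is simply floored at 1/n as a stand-in so that it never vanishes.

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported, Lipschitz kernel (a probability density on [-1, 1])."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def censoring_survival(x, delta, floor=0.0):
    """Product-limit estimate of H* with the roles of X^0 and U reversed.

    Returns a left-continuous step function t -> estimate of H*(t).
    floor > 0 is a hypothetical stand-in for the Blum-Susarla modification.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    order = np.argsort(x)
    z, lam = x[order], delta[order]
    n = len(z)
    i = np.arange(1, n + 1)
    factors = ((n - i) / (n - i + 1.0)) ** (1 - lam)
    steps = np.concatenate(([1.0], np.cumprod(factors)))
    steps[-1] = 0.0  # value for t > Z_n, as in the text

    def h_star(t):
        # Number of ordered observations strictly below t selects the step.
        k = np.searchsorted(z, np.asarray(t, dtype=float), side="left")
        return np.maximum(steps[k], floor)

    return h_star

def f_hat(grid, x, delta, h, kernel=epanechnikov):
    """Form (2.1): the estimate of H* is evaluated at the data points X_j."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(x)
    weights = delta / (n * censoring_survival(x, delta)(x))
    k = kernel((grid[:, None] - x[None, :]) / h)
    return (k * weights[None, :]).sum(axis=1) / h

def f_star(grid, x, delta, h, kernel=epanechnikov):
    """Form (2.2): the estimate of H* is evaluated at the argument x."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(x)
    h_mod = censoring_survival(x, delta, floor=1.0 / n)
    k = kernel((grid[:, None] - x[None, :]) / h)
    return (k * delta[None, :]).sum(axis=1) / (n * h * h_mod(grid))
```

When no observation is censored, both functions reduce to the ordinary kernel density estimate on the range of the data, which gives a quick sanity check.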

It will be assumed throughout that K is a probability density with
compact support and that K is Hölder continuous. In addition, $h \to 0$ and
$nh \to \infty$ as $n \to \infty$. Letting $T_G = \sup\{t: G(t) < 1\}$ for a
distribution function G, it is assumed that $T_H < T_{F^0} < \infty$ and that
$f^0H^*$ is Hölder continuous of order c > 0.

3.  ASYMPTOTIC  REPRESENTATION 


The main idea of this section is that $\hat f_n(x)$ and $f_n^*(x)$ are
essentially the same as

$$\bar f_n(x) = \sum_{j=1}^{n} \frac{\Delta_j}{nH^*(X_j)h}\, K\!\left(\frac{x-X_j}{h}\right), \tag{3.1}$$

$$\bar f_n^*(x) = \sum_{j=1}^{n} \frac{\Delta_j}{nH^*(x)h}\, K\!\left(\frac{x-X_j}{h}\right),$$

respectively, because the convergence of $\hat H_n^*$ and $H_n^*$ to $H^*$ is
faster ($\sim n^{-1/2}$) than that of the density estimators (often
$\sim n^{-2/5}$). Essentially, the same idea has been used by Diehl and Stute
(1985) and Stute (1985). For $\hat f$ equal to any of $\hat f_n$, $f_n^*$,
$\bar f_n$, or $\bar f_n^*$, we choose to analyze its performance by studying
the Integrated Squared Error,

$$ISE(\hat f) = \int_0^\infty [\hat f(x) - f^0(x)]^2 w(x)\,dx,$$

where w(x) is a nonnegative weight function.

There  are  two  major  reasons  for  working  with  ISE  instead  of  with  its 
expected value, MISE. First, ISE is a more compelling error criterion because
it assesses how well $\hat f$ is doing for the data set at hand, instead of only for
the  average  over  all  possible  data  sets  as  is  done  by  MISE.  Second,  ISE  is 
more  natural  for  the  automatic  bandwidth  selection  results  of  the  next  section. 
It  should  be  pointed  out  that  by  using  methods  slightly  easier  than  those  used 
here,  all  of  our  results  can  be  formulated  in  terms  of  MISE.  Also,  there  is  an 
obvious  extension  of  the  theorems  of  this  section  to  the  pointwise  convergence 
of  the  estimators  when  it  is  assessed  by  the  Mean  Square  Error. 

The role of the weight function, w, is to eliminate endpoint effects.
Assume in particular that w is bounded and supported on [0,T], where
$T < T_H$.

The statement of the theorem will be uniform over
$h \in [n^{-1+\varepsilon}, n^{-\varepsilon}]$, some $\varepsilon > 0$. This is
necessary for the automatic bandwidth selection results of section 4.

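Theorem 3.1. Under the conditions on K, $f^0H^*$, $F^0$, $f^0$, and H stated in
section 2, for $h \in [n^{-1+\varepsilon}, n^{-\varepsilon}]$, we have

$$\sup_h \left|\frac{ISE(\hat f_n) - [an^{-1}h^{-1}+b]}{an^{-1}h^{-1}+b}\right| \to 0 \quad \text{a.s.},$$
$$\tag{3.2}$$
$$\sup_h \left|\frac{ISE(f_n^*) - [an^{-1}h^{-1}+b^*]}{an^{-1}h^{-1}+b^*}\right| \to 0 \quad \text{a.s.},$$

where

$$a = \left(\int K^2\right)\left(\int \frac{f^0 w}{H^*}\right),$$

and where b, $b^*$ are defined by

$$b = \int B(x,h)^2 w(x)\,dx,$$

$$b^* = \int B^*(x,h)^2\, \frac{w(x)}{[H^*(x)]^2}\,dx,$$

$$B(x,h) = \int K(u)[f^0(x-hu)-f^0(x)]\,du,$$

$$B^*(x,h) = \int K(u)[f^0(x-hu)H^*(x-hu)-f^0(x)H^*(x)]\,du.$$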


Remark 3.1. Note that an immediate consequence of Theorem 3.1 is the ISE
consistency of $\hat f_n$ and $f_n^*$.

Remark 3.2. The only difference in the asymptotic representations of ISE
shows up in the bias part. Note that for some choices of $f^0$ and $H^*$, b
will be smaller, while for other choices, $b^*$ will be smaller. Hence, the
estimators $\hat f_n$ and $f_n^*$ are really not comparable from this
representation. However, note that, by an addition-subtraction,

$$\frac{B^*(x,h)}{H^*(x)} = \int K(u) f^0(x-hu)\, \frac{H^*(x-hu)-H^*(x)}{H^*(x)}\,du + B(x,h).$$

So in a weak sense, $f_n^*$ has an extra "noise term", which may make
$\hat f_n$ slightly preferable.
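For completeness, the addition-subtraction step can be written out as follows. This is a short worked derivation that uses only the definitions of $B$ and $B^*$ in Theorem 3.1: the quantity $f^0(x-hu)H^*(x)$ is added and subtracted inside the integrand.

```latex
\begin{align*}
\frac{B^*(x,h)}{H^*(x)}
 &= \int K(u)\,\frac{f^0(x-hu)H^*(x-hu) - f^0(x-hu)H^*(x)}{H^*(x)}\,du \\
 &\qquad + \int K(u)\,\frac{f^0(x-hu)H^*(x) - f^0(x)H^*(x)}{H^*(x)}\,du \\
 &= \int K(u)\,f^0(x-hu)\,\frac{H^*(x-hu)-H^*(x)}{H^*(x)}\,du + B(x,h).
\end{align*}
```

Since $b^* = \int [B^*(x,h)/H^*(x)]^2 w(x)\,dx$, the first integral on the last line is the extra term that appears in $b^*$ but not in $b$.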

Rates  of  convergence  may  be  computed  in  the  usual  manner  of  Rosenblatt  and 
Parzen.  Further,  Theorem  3.1  yields  an  asymptotic  bandwidth  which  is  optimal 
in  the  same  sense  as  the  bandwidths  of  Rosenblatt  and  Parzen  except  that  the 
random  error  criterion  ISE  is  used  in  place  of  its  mean.  This  is  given  in  the 
next  remark. 

Remark 3.3. (i) It is well known in the complete sample case that by allowing
K to take on negative values, a faster rate of convergence can be obtained.
Theorem 3.1 demonstrates that the same is true here. In particular, suppose

$$\int x^j K(x)\,dx = \begin{cases}
1, & j=0, \\
0, & j=1,\dots,k-1, \\
\kappa, & j=k,
\end{cases} \tag{3.3}$$

(for k > 2, this violates the assumptions of Theorem 3.1; however it is
straightforward, but space-consuming, to modify the proofs to allow for this).
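To make condition (3.3) concrete, the sketch below checks it exactly for one standard polynomial kernel with k = 4, namely $K(x) = (15/32)(3 - 10x^2 + 7x^4)$ on [-1, 1]. This kernel is our choice for illustration and does not appear in the original text; the helper name `moment` is hypothetical.

```python
from fractions import Fraction

def moment(coeffs, j):
    """Exact integral over [-1, 1] of x**j * sum_m coeffs[m] * x**m."""
    total = Fraction(0)
    for m, c in enumerate(coeffs):
        p = m + j
        if p % 2 == 0:  # odd powers integrate to zero on a symmetric interval
            total += Fraction(c) * Fraction(2, p + 1)
    return total

# Coefficients of K(x) = (15/32) * (3 - 10 x^2 + 7 x^4), lowest power first.
K4 = [Fraction(45, 32), 0, Fraction(-150, 32), 0, Fraction(105, 32)]

# Moments of order j = 0, 1, 2, 3, 4.
moments = [moment(K4, j) for j in range(5)]
```

The moments of order 0 to 3 come out as 1, 0, 0, 0, and the fourth moment is the constant $\kappa = -1/21$, so (3.3) holds with k = 4. The kernel is negative near the ends of its support, which is why it falls outside the probability density assumption of section 2.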


If we assume that $f^0$ and $f^0H^*$ have k uniformly continuous derivatives,
then

$$b = h^{2k}\left(\frac{\kappa}{k!}\right)^2 \int [(f^0)^{(k)}]^2 w\,dx + o(h^{2k}),$$

$$b^* = h^{2k}\left(\frac{\kappa}{k!}\right)^2 \int [(f^0H^*)^{(k)}]^2 \frac{w}{(H^*)^2}\,dx + o(h^{2k}).$$

Hence, for the estimator $\hat f_n$, the "classical optimal bandwidth" has the form

$$h_0 = \left[\frac{\left(\int K^2\right)\left(\int \frac{f^0 w}{H^*}\right)}
{2k\left(\frac{\kappa}{k!}\right)^2 \int ((f^0)^{(k)})^2 w}\right]^{1/(2k+1)} n^{-1/(2k+1)},$$

and the rate of convergence is $ISE \sim n^{-2k/(2k+1)}$. Here and in the
following remarks, there are obvious analogues for the estimator $f_n^*$.
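The form of $h_0$ comes from minimizing $a n^{-1}h^{-1} + c\,h^{2k}$ over h, where $a$ is the constant of Theorem 3.1 and $c$ abbreviates $(\kappa/k!)^2\int((f^0)^{(k)})^2 w$. The sketch below, with hypothetical function names and arbitrary illustrative constants, compares the closed form against a grid search and checks the $n^{-2k/(2k+1)}$ rate.

```python
import numpy as np

def classical_bandwidth(a, c, k, n):
    """Closed-form minimizer of a / (n h) + c h^(2k) over h > 0."""
    return (a / (2.0 * k * c)) ** (1.0 / (2 * k + 1)) * n ** (-1.0 / (2 * k + 1))

def ei0(h, a, c, k, n):
    """Variance term plus leading squared-bias term."""
    return a / (n * h) + c * h ** (2 * k)

# Illustrative constants only; in practice a and c depend on the unknown f^0 and H*.
a, c, k, n = 0.9, 2.5, 2, 1000
h0 = classical_bandwidth(a, c, k, n)
grid = np.linspace(0.5 * h0, 1.5 * h0, 2001)
h_grid = grid[np.argmin(ei0(grid, a, c, k, n))]
```

Setting the derivative $-a n^{-1}h^{-2} + 2kc\,h^{2k-1}$ to zero gives $h^{2k+1} = a/(2kcn)$, which is the displayed expression. Both constants involve $f^0$, which is why this bandwidth is not directly usable in practice.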


To see how Theorem 3.1 implies that $h_0$ behaves like the optimal bandwidth
of Rosenblatt and Parzen (the complete sample case), define

$$EI_0(h) = n^{-1}h^{-1}\left[\int K^2\right]\left[\int \frac{f^0 w}{H^*}\right]
 + h^{2k}\left(\frac{\kappa}{k!}\right)^2 \int [(f^0)^{(k)}]^2 w.$$

By (3.2), with obvious notation,

$$\sup_h \left|\frac{ISE(\hat f_n,h)-EI_0(h)}{EI_0(h)}\right| \to 0 \quad \text{a.s.}$$


Let $\hat h_0$ denote the minimizer of $ISE(\hat f_n,h)$ and recall that $h_0$
is the minimizer of $EI_0(h)$. Then from the inequalities
$ISE(\hat f_n,h_0) \ge ISE(\hat f_n,\hat h_0)$ and $EI_0(\hat h_0) \ge EI_0(h_0)$,
it follows that

$$0 \le \frac{ISE(\hat f_n,h_0)-ISE(\hat f_n,\hat h_0)}{ISE(\hat f_n,h_0)}
 \le \frac{ISE(\hat f_n,h_0)-EI_0(h_0)}{ISE(\hat f_n,h_0)}
 + \frac{EI_0(\hat h_0)-ISE(\hat f_n,\hat h_0)}{ISE(\hat f_n,h_0)}.$$

Hence,

$$\frac{ISE(\hat f_n,h_0)-ISE(\hat f_n,\hat h_0)}{ISE(\hat f_n,h_0)} \to 0 \quad \text{a.s.},
\qquad
\frac{ISE(\hat f_n,h_0)}{\inf_h ISE(\hat f_n,h)} \to 1 \quad \text{a.s.},$$

which shows that $h_0$ is optimal in the same sense as the bandwidths of
Rosenblatt and Parzen, except for the fact that the random ISE criterion is
used in place of its mean.


Remark 3.3. (ii) If we keep the assumption (3.3), but suppose $f^0$ has p < k
derivatives (p need not be an integer by putting a Hölder condition of order
p-[p] on the [p]-th derivative, where [·] denotes the greatest integer less
than or equal to p), then it can be shown that $b \le C h^{2p}$ for some positive
constant C. Hence, by taking $h \sim n^{-1/(2p+1)}$, the well-known (see, for
example, Bretagnolle and Huber (1979)) "optimal rate," $ISE \sim n^{-2p/(2p+1)}$,
can be obtained for our censored data problem.


4.  AUTOMATIC  BANDWIDTH  SELECTION

For data-based bandwidth selection, we propose least-squares cross-validation,
which was invented, for complete sample density estimators, by Rudemo
(1982) and Bowman (1984). This is motivated as follows. Let $\hat f$ denote
either $\hat f_n$ or $f_n^*$. Since the third term of

$$ISE(\hat f) = \int \hat f^2 w - 2\int \hat f f^0 w + \int (f^0)^2 w$$

is independent of h, we would like to choose h to minimize the sum of the
first two terms. The first term is known. The integral of the second term can
be unbiasedly estimated by

$$n^{-1}\sum_{i=1}^{n} \hat f_{ni}(X_i)\, \frac{w(X_i)}{\hat H_n^*(X_i)}\, 1_{[\Delta_i=1]},$$

where $\hat f_{ni}$ is the "leave-one-out" version of $\hat f$, given by

$$\hat f_{ni}(x) = \sum_{j \ne i} \frac{1}{(n-1)\hat H_n^*(X_j)h}\, K\!\left(\frac{x-X_j}{h}\right) 1_{[\Delta_j=1]}$$

when $\hat f$ is $\hat f_n$, and by

$$f_{ni}^*(x) = \sum_{j \ne i} \frac{1}{(n-1)H_n^*(x)h}\, K\!\left(\frac{x-X_j}{h}\right) 1_{[\Delta_j=1]}$$

when $\hat f$ is $f_n^*$. Thus, we define $\hat h_c$ to be the minimizer of the
least-squares cross-validation criterion

$$CV(h) = \int [\hat f(x)]^2 w(x)\,dx - 2n^{-1}\sum_{i=1}^{n} \hat f_{ni}(X_i)\, \frac{w(X_i)}{\hat H_n^*(X_i)}\, 1_{[\Delta_i=1]}.$$
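The criterion can be sketched in code for the estimator of the form (2.1). The version below is illustrative only: the function names are hypothetical, the Epanechnikov kernel and the trapezoid rule for the first integral are our choices, the product-limit estimate of $H^*$ is used in the second term, and the data are assumed to have no ties.

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def h_star_at_data(x, delta):
    """Product-limit estimate of H*, evaluated at each X_i (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(x)
    order = np.argsort(x)
    lam = delta[order]
    i = np.arange(1, n + 1)
    factors = ((n - i) / (n - i + 1.0)) ** (1 - lam)
    at_sorted = np.concatenate(([1.0], np.cumprod(factors[:-1])))
    out = np.empty(n)
    out[order] = at_sorted  # map values back to the original data order
    return out

def cv_score(h, x, delta, grid, weight=lambda t: np.ones_like(t),
             kernel=epanechnikov):
    """Least-squares cross-validation score CV(h) for the form (2.1)."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(x)
    a = delta / h_star_at_data(x, delta)  # Delta_j / H*_n(X_j)
    # First term: integral of f_hat^2 w by the trapezoid rule on the grid.
    f_grid = (kernel((grid[:, None] - x[None, :]) / h) * a[None, :]).sum(axis=1) / (n * h)
    y = f_grid ** 2 * weight(grid)
    first = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid))
    # Second term: leave-one-out estimate evaluated at each X_i.
    kmat = kernel((x[:, None] - x[None, :]) / h) * a[None, :]
    np.fill_diagonal(kmat, 0.0)
    loo = kmat.sum(axis=1) / ((n - 1) * h)
    second = np.mean(loo * weight(x) * a)
    return first - 2.0 * second

def cv_bandwidth(candidates, x, delta, grid, **kwargs):
    """Return the candidate bandwidth with the smallest CV score."""
    scores = [cv_score(h, x, delta, grid, **kwargs) for h in candidates]
    return candidates[int(np.argmin(scores))]
```

The grid should cover the support of the weight function, widened by h times the kernel radius, so that the first integral is not truncated. In practice the candidate bandwidths would be restricted to a range such as the one in Theorem 3.1.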

Theorem 4.1. Under the conditions of Theorem 3.1, $\hat h_c$ is asymptotically
optimal in the sense that

$$\frac{ISE(\hat f,\hat h_c)}{\inf_h ISE(\hat f,h)} \to 1 \quad \text{a.s.}$$


Remark 4.1. Theorem 4.1 says that $\hat h_c$ is optimal under either of the
assumptions  stated  in  Remark  3.3  (i)  or  Remark  3.3  (ii).  This  generalizes 
the  important  asymptotic  optimality  results  of  Hall  (1983),  Stone  (1984), 
Burman  (1985),  Hall  (1985),  and  Marron  (1985)  to  the  case  of  censored  data. 

Remark 4.2. The fact that CV(h) essentially provides an estimate of
$ISE(\hat f,h)$ suggests a practical method of choosing between $\hat f_n$ and
$f_n^*$. In particular, if $CV(\hat h_c)$ for $\hat f = \hat f_n$ is smaller
than $CV(\hat h_c)$ for $\hat f = f_n^*$, then the estimator $\hat f_n$ should
be used, as its ISE will probably be smaller.

5.  PROOFS  OF  THEOREMS

All proofs are given for the estimator $\hat f_n(x)$, as it will be obvious how
to adapt them to handle $f_n^*(x)$. The symbol C will be used for a generic
constant. Note first that, using the notation (3.1), by adding and subtracting
$\bar f_n(x)$,

$$ISE(\hat f_n) = ISE(\bar f_n) + II + III, \tag{5.1}$$

where

$$II = 2\int_0^\infty [\bar f_n(x) - f^0(x)][\hat f_n(x)-\bar f_n(x)]\, w(x)\,dx,$$

$$III = \int_0^\infty [\hat f_n(x)-\bar f_n(x)]^2\, w(x)\,dx.$$
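The decomposition (5.1) is a pointwise algebraic identity, so it also holds for any quadrature rule applied to the four integrals. The sketch below checks this on arbitrary illustrative functions; the function names and the test functions are hypothetical.

```python
import numpy as np

def trapezoid(y, grid):
    """Trapezoid-rule integral of the samples y over the grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))

def ise_terms(f_hat, f_bar, f0, w, grid):
    """Return ISE(f_hat), ISE(f_bar), II and III from decomposition (5.1)."""
    ise_hat = trapezoid((f_hat - f0) ** 2 * w, grid)
    ise_bar = trapezoid((f_bar - f0) ** 2 * w, grid)
    term_ii = trapezoid(2.0 * (f_bar - f0) * (f_hat - f_bar) * w, grid)
    term_iii = trapezoid((f_hat - f_bar) ** 2 * w, grid)
    return ise_hat, ise_bar, term_ii, term_iii

# Arbitrary stand-ins for the true density and the two estimators.
grid = np.linspace(0.0, 2.0, 401)
f0 = np.exp(-grid)
f_bar = f0 + 0.05 * np.sin(3 * grid)
f_hat = f_bar + 0.01 * np.cos(5 * grid)
w = (grid <= 1.5).astype(float)
ise_hat, ise_bar, term_ii, term_iii = ise_terms(f_hat, f_bar, f0, w, grid)
```

The cross term satisfies $|II| \le 2\,[ISE(\bar f_n)\cdot III]^{1/2}$ by the Schwarz inequality, which is how control of III carries over to II in the proof of Theorem 3.1.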

Proof of Theorem 3.1. We analyze each of the terms $ISE(\bar f_n)$, II, and III
separately. First, by a "variance-bias squared" decomposition, and standard
computations of the type in Rosenblatt (1971),

$$MISE(\bar f_n) = E(ISE(\bar f_n)) = v + b, \tag{5.2}$$

where

$$v = n^{-1}h^{-1}\left(\int K^2\right)\left(\int \frac{f^0 w}{H^*}\right) + o(n^{-1}h^{-1}), \tag{5.3}$$

and where b is defined in section 3. The fact that $ISE(\bar f_n)$ behaves like
$MISE(\bar f_n)$ is contained in the following lemma.

Lemma 1.
$$\sup_h \left|\frac{ISE(\bar f_n)-MISE(\bar f_n)}{MISE(\bar f_n)}\right| \to 0 \quad \text{a.s.}$$

The fact that term III is negligible is contained in

Lemma 2.
$$\sup_h \frac{III}{MISE(\bar f_n)} \to 0 \quad \text{a.s.}$$

It follows from the Schwarz inequality, Lemma 1, and Lemma 2 that III
may be replaced by II in the statement of Lemma 2.

This last fact, together with (5.1), (5.2), (5.3), Lemma 1, and Lemma 2,
completes the proof of Theorem 3.1.

Before  proving  Theorem  4.1,  we  give  the  proof  of  Lemmas  1  and  2. 

Proof of Lemma 1. Let $N = \#\{i: \Delta_i = 1\}$. For $\nu = 1,\dots,n$,
conditioning on $N = \nu$, $\{X_i: \Delta_i = 1\}$ is a set of $\nu$ i.i.d.
random variables with density $f^0H^*/p$, where

$$p = \int_0^\infty f^0(x)H^*(x)\,dx.$$

Let $E_\nu$ denote expectation under this conditional distribution. The method of
the proof of Theorem 1 of Marron and Härdle (1986) shows that, under the stated
assumptions, for k = 1,2,..., there exist constants C > 0 and r > 0 so that

$$\sup_h E_\nu\left[\frac{ISE(\bar f_n)-E_\nu(ISE(\bar f_n))}{E_\nu(ISE(\bar f_n))}\right]^{2k} \le C\nu^{-rk}. \tag{5.4}$$


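To analyze $E_\nu(ISE(\bar f_n))$, note first that

$$\begin{aligned}
E_\nu \bar f_n(x) - f^0(x)
 &= \frac{\nu}{n}\int \frac{1}{H^*(y)h}\, K\!\left(\frac{x-y}{h}\right) \frac{f^0(y)H^*(y)}{p}\,dy - f^0(x) \\
 &= \int K(u)\left[\frac{\nu}{np}\, f^0(x-hu) - f^0(x)\right]du \\
 &= \frac{\nu}{np}\, B(x,h) + \left(\frac{\nu}{np}-1\right) f^0(x),
\end{aligned}$$

where B(x,h) was defined in section 3. Next note that

$$\begin{aligned}
\mathrm{var}_\nu(\bar f_n(x))
 &= \mathrm{var}_\nu\left(\sum_{i=1}^{n} \frac{\Delta_i}{nH^*(X_i)h}\, K\!\left(\frac{x-X_i}{h}\right)\right) \\
 &= \frac{\nu}{n^2}\, \mathrm{var}_\nu\left(\frac{1}{H^*(X_i)h}\, K\!\left(\frac{x-X_i}{h}\right)\right) \\
 &= \frac{\nu}{np}\, n^{-1}h^{-1}\left(\int K^2\right)\frac{f^0(x)}{H^*(x)} + o\!\left(\frac{\nu}{np}\, n^{-1}h^{-1}\right).
\end{aligned}$$

Thus, by a "variance-bias squared" decomposition,

$$E_\nu(ISE(\bar f_n)) = v_\nu + b_\nu,$$

where

$$v_\nu = \frac{\nu}{np}\, v + o\!\left(\frac{\nu}{np}\, n^{-1}h^{-1}\right),$$

for v defined in (5.3), and where

$$b_\nu = \left(\frac{\nu}{np}\right)^2 b
 + 2\,\frac{\nu}{np}\left(\frac{\nu}{np}-1\right)\int_0^\infty B(x,h)f^0(x)w(x)\,dx
 + \left(\frac{\nu}{np}-1\right)^2 \int_0^\infty f^0(x)^2 w(x)\,dx,$$

for b as in section 3. Hence,

$$\begin{aligned}
E_\nu(ISE(\bar f_n)) = MISE(\bar f_n)
 &+ \left(\frac{\nu}{np}-1\right) v + o\!\left(\frac{\nu}{np}\, n^{-1}h^{-1}\right) \\
 &+ \left(\left(\frac{\nu}{np}\right)^2-1\right) b
 + 2\,\frac{\nu}{np}\left(\frac{\nu}{np}-1\right)\int_0^\infty B(x,h)f^0(x)w(x)\,dx \\
 &+ \left(\frac{\nu}{np}-1\right)^2 \int_0^\infty f^0(x)^2 w(x)\,dx.
\end{aligned}$$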

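Now for small $\tau > 0$ and for n = 1,2,3,..., restrict attention to $\nu$
between $np - n^{1/2+\tau}$ and $np + n^{1/2+\tau}$. For such $\nu$,
$\nu/(np) \le 2$ and

$$\left|\frac{\nu}{np} - 1\right| \le C n^{-1/2+\tau}$$

for a constant C. It follows from (5.2) and (5.3) that, for a different
value of C, and for n sufficiently large,

$$\inf_h MISE(\bar f_n) \ge C n^{-1+\varepsilon}. \tag{5.5}$$

Hence, for small $\tau$, large n, and another C,

$$\sup_h \left|\frac{E_\nu(ISE(\bar f_n))-MISE(\bar f_n)}{MISE(\bar f_n)}\right| \le C n^{-\varepsilon+2\tau}.$$

Thus, for such $\nu$, from (5.4),

$$\sup_h E_\nu\left[\frac{ISE(\bar f_n)-MISE(\bar f_n)}{MISE(\bar f_n)}\right]^{2k} \le C n^{-rk}.$$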


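Now, let $\Gamma_n$ be a subset of $[n^{-1+\varepsilon}, n^{-\varepsilon}]$ so that
successive members of $\Gamma_n$ are separated by a distance less than or equal
to $n^{-\rho}$ and so that $\#(\Gamma_n) \le n^{\rho}$ for some $\rho > 0$. Then,
using obvious notation,

$$\begin{aligned}
P\left[\sup_h \left|\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right| > \varepsilon\right]
 &\le P\left[\sup_{h\in\Gamma_n} \left|\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right| > \frac{\varepsilon}{2}\right] \\
 &\quad + P\left[\sup_{|h-h'|\le n^{-\rho}} \left|\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)} - \frac{ISE(\bar f_n,h')-MISE(\bar f_n,h')}{MISE(\bar f_n,h')}\right| > \frac{\varepsilon}{2}\right] \\
 &= \sum_{\nu=0}^{n} \binom{n}{\nu} p^{\nu}(1-p)^{n-\nu}\, P_\nu\left[\sup_{h\in\Gamma_n} \left|\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right| > \frac{\varepsilon}{2}\right],
\end{aligned}$$

where the last equality comes from a continuity argument and the assumptions
that K is Hölder continuous and has compact support. Letting
$A_{n,\tau} = [np - n^{1/2+\tau},\ np + n^{1/2+\tau}]$,

$$\begin{aligned}
P\left[\sup_h \left|\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right| > \varepsilon\right]
 &\le \sum_{\nu\in A_{n,\tau}} \binom{n}{\nu} p^{\nu}(1-p)^{n-\nu}\, P_\nu\left[\sup_{h\in\Gamma_n} \left|\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right| > \frac{\varepsilon}{2}\right] \\
 &\quad + \sum_{\nu\notin A_{n,\tau}} \binom{n}{\nu} p^{\nu}(1-p)^{n-\nu} \\
 &\le \sum_{\nu\in A_{n,\tau}} \binom{n}{\nu} p^{\nu}(1-p)^{n-\nu} \sum_{h\in\Gamma_n} P_\nu\left[\left|\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right| > \frac{\varepsilon}{2}\right] + 2\Phi(-n^{\tau}) \\
 &\le \sum_{\nu\in A_{n,\tau}} \binom{n}{\nu} p^{\nu}(1-p)^{n-\nu}\, n^{\rho}\left(\frac{2}{\varepsilon}\right)^{2k} \sup_h E_\nu\left[\frac{ISE(\bar f_n,h)-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right]^{2k} + 2\Phi(-n^{\tau}) \\
 &\le C n^{\rho} n^{-rk} + 2\Phi(-n^{\tau}),
\end{aligned} \tag{5.6}$$

where $\Phi$ denotes the standard normal cdf. But, for k sufficiently large,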
the  first  term  on  the  right  side  of  (5.6)  is  summable  on  n,  and,  since  the 
second  term  is  also  summable  on  n,  the  proof  of  Lemma  1  is  complete. 
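The second term in (5.6) controls the probability that the number N of uncensored observations falls outside $A_{n,\tau}$. As a rough numerical illustration, the sketch below computes that binomial tail exactly and compares it with Hoeffding's bound $2\exp(-2n^{2\tau})$. Hoeffding's inequality is substituted here for the normal approximation used in the text because it is an exact bound that is likewise summable in n; the function names are hypothetical.

```python
import math

def binomial_tail_outside(n, p, tau):
    """Exact P(|N - n p| > n**(0.5 + tau)) for N ~ Binomial(n, p)."""
    half_width = n ** (0.5 + tau)
    total = 0.0
    for v in range(n + 1):
        if abs(v - n * p) > half_width:
            # Log of the binomial probability, computed stably via lgamma.
            log_pmf = (math.lgamma(n + 1) - math.lgamma(v + 1)
                       - math.lgamma(n - v + 1)
                       + v * math.log(p) + (n - v) * math.log(1.0 - p))
            total += math.exp(log_pmf)
    return total

def hoeffding_bound(n, tau):
    """Hoeffding's bound on the same tail: 2 exp(-2 t^2 / n), t = n**(0.5 + tau)."""
    return 2.0 * math.exp(-2.0 * n ** (2.0 * tau))

# Exact tail and bound for a few sample sizes, with p = 0.7 and tau = 0.1.
pairs = [(binomial_tail_outside(n, 0.7, 0.1), hoeffding_bound(n, 0.1))
         for n in (50, 200, 800)]
```

Because the bound decays faster than any power of n, its sum over n is finite, which is the summability needed for the second term.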

Proof of Lemma 2. Using the assumption on the support of w, and using the
compactness of the support of K, observe that for n sufficiently large,

$$\begin{aligned}
\sup_h III
 &= \sup_h \int_0^\infty \left[\sum_{i=1}^{n} \left(\frac{1}{\hat H_n^*(X_i)} - \frac{1}{H^*(X_i)}\right) (nh)^{-1} K\!\left(\frac{x-X_i}{h}\right) 1_{[\Delta_i=1]}\right]^2 w(x)\,dx \\
 &\le \left(\sup_{t\in[0,T']} \left|\frac{1}{\hat H_n^*(t)} - \frac{1}{H^*(t)}\right|\right)^2 \left(\sup_h \int_0^\infty [(f^0H^*)_n(x)]^2 w(x)\,dx\right),
\end{aligned}$$

where $T' = (T+T_H)/2$, and where $(f^0H^*)_n$ was defined in section 2. Lemma 2
is now a consequence of the results of Csörgő and Horváth (1983) together
with (5.5) and the fact that there is a constant C so that

$$\sup_h \int_0^\infty [(f^0H^*)_n(x)]^2 w(x)\,dx \le C \quad \text{a.s.} \tag{5.7}$$

To verify (5.7), note that by adding and subtracting $f^0(x)H^*(x)$,

$$\int_0^\infty [(f^0H^*)_n(x)]^2\, w(x)\,dx = U + V + W,$$

where

$$U = \int [(f^0H^*)_n - f^0H^*]^2\, w(x)\,dx,$$

$$V = 2\int [(f^0H^*)_n - f^0H^*][f^0H^*]\, w(x)\,dx,$$

$$W = \int [f^0H^*]^2\, w(x)\,dx.$$

Now W is deterministic and independent of h. An argument similar to (but
slightly easier than) that used above on $ISE(\bar f_n)$ gives

$$\sup_h U \to 0 \quad \text{a.s.}$$

An application of the Schwarz inequality to V yields (5.7), which completes
the proof of Lemma 2.

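Proof of Theorem 4.1. Here again, only the proof in the slightly harder case
of $\hat f = \hat f_n$ is given. We note that by a computation similar to that
used to verify Remark 3.3 (i), Theorem 4.1 follows from (3.2) and the result that

$$\sup_{h,h'} \frac{|CV(h)-ISE(\hat f_n,h)-[CV(h')-ISE(\hat f_n,h')]|}{MISE(\bar f_n,h) + MISE(\bar f_n,h')} \to 0 \quad \text{a.s.} \tag{5.8}$$

To prove (5.8) it is enough to show that

$$\sup_h \frac{\left|CV(h)-ISE(\hat f_n,h) + 2n^{-1}\sum_{i=1}^{n} \frac{f^0(X_i)w(X_i)}{\hat H_n^*(X_i)}\, 1_{[\Delta_i=1]} - \int (f^0(x))^2 w(x)\,dx\right|}{MISE(\bar f_n,h)} \to 0 \quad \text{a.s.}$$

This may be rewritten as

$$\sup_h \frac{\left|2n^{-1}(n-1)^{-1}\sum_{i=1}^{n}\sum_{j\ne i} U_{ij}\right|}{MISE(\bar f_n,h)} \to 0 \quad \text{a.s.},$$

defining $U_{ij} = U'_{ij} + Z_{ij}$, where

$$\begin{aligned}
U_{ij} &= h^{-1}K\!\left(\frac{X_i-X_j}{h}\right) \frac{w(X_i)}{\hat H_n^*(X_j)\hat H_n^*(X_i)}\, 1_{[\Delta_i=1,\Delta_j=1]}
 - \int h^{-1}K\!\left(\frac{x-X_j}{h}\right) \frac{f^0(x)w(x)}{\hat H_n^*(X_j)}\,dx\, 1_{[\Delta_j=1]} \\
 &\quad - \frac{f^0(X_i)w(X_i)}{\hat H_n^*(X_i)}\, 1_{[\Delta_i=1]} + \int [f^0(x)]^2 w(x)\,dx,
\end{aligned}$$

$U'_{ij}$ is the same expression with $H^*(X_i)$ in place of $\hat H_n^*(X_i)$, and

$$Z_{ij} = \left[h^{-1}K\!\left(\frac{X_i-X_j}{h}\right) \frac{1_{[\Delta_j=1]}}{\hat H_n^*(X_j)} - f^0(X_i)\right] w(X_i)\, 1_{[\Delta_i=1]} \left[\frac{1}{\hat H_n^*(X_i)} - \frac{1}{H^*(X_i)}\right].$$

Theorem 4.1 then follows from the two lemmas:

Lemma 3.
$$\sup_h \frac{\left|n^{-1}(n-1)^{-1}\sum_{i}\sum_{j\ne i} U'_{ij}\right|}{MISE(\bar f_n,h)} \to 0 \quad \text{a.s.}$$

Lemma 4.
$$\sup_h \frac{\left|n^{-1}(n-1)^{-1}\sum_{i}\sum_{j\ne i} Z_{ij}\right|}{MISE(\bar f_n,h)} \to 0 \quad \text{a.s.}$$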


Proof of Lemma 3. This proof combines the ideas of Lemma 2 of Marron (1985)
with those of the proof of Lemma 1 above. Recall that in the proof of Lemma 1,
the notation $E_\nu$ meant expected value taken over $\{X_i: \Delta_i=1\}$,
conditioned on the event $\{N=\nu\}$. The censored observations
$\{X_i: \Delta_i=0\}$ were ignored in the definition of $E_\nu$ since they did
not appear in the quantities being analyzed. The censored $X_i$'s do appear in
the following, so it will be understood that $E_\nu$ denotes expected value as
above, only also conditioned on $\{X_i: \Delta_i=0\}$ (or, equivalently, $E_\nu$
denotes integration over $\{X_i: \Delta_i=1\}$, which are i.i.d. random
variables with density $f^0H^*/p$).

For $\nu = 1,\dots,n$, write $U'_{ij} = U''_{ij} + Z'_{ij}$ (the displayed
definitions of $U''_{ij}$ and $Z'_{ij}$ are not legible in the source).

Using the method of proof of Lemma 2 of Marron (1985), it can be shown that,
for k = 1,2,..., and n sufficiently large,

$$\sup_h E_\nu\left[\frac{n^{-1}(n-1)^{-1}\sum_{i}\sum_{j\ne i} U''_{ij}}{MISE(\bar f_n,h)}\right]^{2k} \le C n^{-rk},$$

regardless of the realization of $\{X_i: \Delta_i=0\}$. In a similar manner (i.e.,
approximating $\hat H_n^*(x)$ by $H^*(x)$, including another $\nu/(np) - 1$ term,
and using the cumulant-style argument of Marron (1985)), we can obtain

$$\sup_h E_\nu\left[\frac{n^{-1}(n-1)^{-1}\sum_{i}\sum_{j\ne i} Z'_{ij}}{MISE(\bar f_n,h)}\right]^{2k} \le C n^{-rk}.$$

These two inequalities may now be used in a computation similar to that yielding
(5.6) in the proof of Lemma 1 to finish the proof of Lemma 3.


Proof of Lemma 4. Write

$$\begin{aligned}
\left|n^{-1}(n-1)^{-1}\sum_{i}\sum_{j\ne i} Z_{ij}\right|
 &= \left|n^{-1}\sum_{i} [\hat f_{ni}(X_i)-f^0(X_i)] \left[\frac{1}{\hat H_n^*(X_i)} - \frac{1}{H^*(X_i)}\right] 1_{[\Delta_i=1]}\, w(X_i)\right| \\
 &\le \left\{n^{-1}\sum_{i} [\hat f_{ni}(X_i)-f^0(X_i)]^2\, 1_{[\Delta_i=1]}\, w(X_i)\right\}^{1/2} \\
 &\quad \times \left\{n^{-1}\sum_{i} \left[\frac{1}{\hat H_n^*(X_i)} - \frac{1}{H^*(X_i)}\right]^2 1_{[\Delta_i=1]}\, w(X_i)\right\}^{1/2}.
\end{aligned} \tag{5.9}$$

The expression inside the first square-root on the right-hand side of (5.9) is
the "leave-one-out" version of the average squared error and will be denoted
by $ASE(\hat f_{ni})$. Using the methods of Lemma 1 of Marron (1985) and Theorem 2
of Marron and Härdle (1986), it can be shown that for k = 1,2,... there is a
constant C so that

$$E\left[\frac{ASE(\hat f_{ni})-MISE(\bar f_n,h)}{MISE(\bar f_n,h)}\right]^{2k} \le C n^{-rk}.$$

The proof of Lemma 4 is then completed by a computation like that leading to
(5.6) in the proof of Lemma 1, which includes the uniform convergence result
for the product-limit estimator $\hat H_n^*$ used in the proof of Lemma 2.